Abstract

We examine the recovery of block sparse signals and extend the framework in two important
directions: one by exploiting signals' intra-block correlation, and the other by generalizing signals'
block structure. We propose two families of algorithms based on the framework of block sparse Bayesian
learning (BSBL). One family, directly derived from the BSBL framework, requires knowledge of the 
block structure. Another family, derived from an expanded BSBL framework, is based on a weaker 
assumption on the block structure, and can be used when the block structure is completely unknown. 
Using these algorithms we show that exploiting intra-block correlation is very helpful in improving 
recovery performance. These algorithms also shed light on how to modify existing algorithms or design 
new ones to exploit such correlation and improve performance. 

Index Terms 

Sparse Signal Recovery, Compressed Sensing, Block Sparse Model, Sparse Bayesian Learning (SBL), 
Intra-Block Correlation 

I. Introduction 

Sparse signal recovery and the associated problems of compressed sensing have received much attention 
in recent years [1]. The basic model is given by 

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{v}, \tag{1}$$

where $\mathbf{y} \in \mathbb{R}^{M \times 1}$ is a known measurement vector, $\boldsymbol{\Phi} \in \mathbb{R}^{M \times N}$ ($M \ll N$) is a known matrix (generally
called the sensing matrix) in which any $M$ columns are linearly independent, $\mathbf{x} \in \mathbb{R}^{N \times 1}$ is a sparse signal
which we want to recover, and $\mathbf{v}$ is a noise vector. In applications, $\mathbf{x}$ generally has additional structure.
A widely studied structure is block/group structure [2]-[4]. With this structure, $\mathbf{x}$ can be viewed as a
concatenation of $g$ blocks, i.e.,

$$\mathbf{x} = [\underbrace{x_1, \cdots, x_{d_1}}_{\mathbf{x}_1^T}, \cdots, \underbrace{x_{d_{g-1}+1}, \cdots, x_{d_g}}_{\mathbf{x}_g^T}]^T \tag{2}$$

where $d_i$ ($\forall i$) are not necessarily identical. Among the $g$ blocks, only $k$ ($k \ll g$) blocks are nonzero but
their locations are unknown. It is known that exploiting such block partition can further improve recovery 
performance. 

A number of algorithms have been proposed to recover sparse signals with the block structure. Typical
algorithms include Model-CoSaMP [3], Block-OMP [4], and Group-Lasso type algorithms such as the
original Group Lasso algorithm [2], Group Basis Pursuit [5], and the Mixed $\ell_2/\ell_1$ Program [6]. These
algorithms require knowledge of the block partition (2). Other algorithms, such as StructOMP [7], do
not need to know the block partition but need other a priori information, e.g., the number of
nonzero elements in $\mathbf{x}$. Recently, CluSS-MCMC [8] and BM-MAP-OMP [9] have been proposed, which
require very little a priori knowledge.

However, few existing algorithms consider intra-block correlation, i.e., the correlation among
amplitudes of the elements within each block. In practical applications intra-block correlation is common,
for example in physiological signals [10] and images. In this work we derive several algorithms that explore and
exploit the intra-block correlation to improve performance. These algorithms are based on our recently
proposed block sparse Bayesian learning (BSBL) framework [11]. Although the framework was initially
used to derive algorithms for multiple measurement vector (MMV) models [12], it has not been used
for the block sparse model (1)-(2). The successes of sparse Bayesian learning methods in past contexts
motivate us to consider their extension to this problem and fill this gap.

One contribution of our work is that the proposed algorithms are the first ones in the category 
that adaptively explore and exploit the intra-block correlation. Experiments showed that the developed 
algorithms significantly outperform competitive algorithms. We also suggest a promising strategy to 
incorporate the intra-block correlation in the Group-Lasso type algorithms to improve their performance. 

Another contribution is the insight into the effect of the intra-block correlation on algorithms'
performance. An MMV model can be viewed as a special case of a block sparse model. But we found the
effect of the intra-block correlation on algorithmic performance is quite different from the effect of the
temporal correlation [11].

The third contribution is the development of a simple approximate model and corresponding algorithms 
to solve the problem when the block partition is entirely unknown. These algorithms are effective 
especially in noisy environments. 

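In this paper bold symbols are reserved for vectors and matrices. For square matrices $\mathbf{A}_1, \cdots, \mathbf{A}_g$,
$\mathrm{diag}\{\mathbf{A}_1, \cdots, \mathbf{A}_g\}$ denotes a block diagonal matrix with principal diagonal blocks being $\mathbf{A}_1, \cdots, \mathbf{A}_g$ in
turn. $\mathrm{Tr}(\mathbf{A})$ denotes the trace of $\mathbf{A}$. $\boldsymbol{\gamma} \succeq \mathbf{0}$ means each element in the vector $\boldsymbol{\gamma}$ is nonnegative.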

Parts of this work have been published in [13]. 

II. Overview of the BSBL Framework 

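This section briefly describes the BSBL framework [11], upon which we develop our algorithms. In
this framework, each block $\mathbf{x}_i \in \mathbb{R}^{d_i \times 1}$ is assumed to satisfy a parameterized multivariate Gaussian
distribution:

$$p(\mathbf{x}_i; \gamma_i, \mathbf{B}_i) \sim \mathcal{N}(\mathbf{0}, \gamma_i\mathbf{B}_i), \quad i = 1, \cdots, g$$

with the unknown parameters $\gamma_i$ and $\mathbf{B}_i$. Here $\gamma_i$ is a nonnegative parameter controlling the block-sparsity
of $\mathbf{x}$. When $\gamma_i = 0$, the $i$-th block becomes zero. During the learning procedure most $\gamma_i$ tend to be
zero, due to the mechanism of automatic relevance determination [14]. Thus sparsity at the block level is
encouraged. $\mathbf{B}_i \in \mathbb{R}^{d_i \times d_i}$ is a positive definite matrix, capturing the correlation structure of the $i$-th block.
Under the assumption that blocks are mutually uncorrelated, the prior of $\mathbf{x}$ is $p(\mathbf{x}; \{\gamma_i, \mathbf{B}_i\}_i) \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_0)$,
where $\boldsymbol{\Sigma}_0 = \mathrm{diag}\{\gamma_1\mathbf{B}_1, \cdots, \gamma_g\mathbf{B}_g\}$. Assume the noise vector satisfies $p(\mathbf{v}; \lambda) \sim \mathcal{N}(\mathbf{0}, \lambda\mathbf{I})$, where $\lambda$ is
a positive scalar. Therefore the posterior of $\mathbf{x}$ is given by $p(\mathbf{x}|\mathbf{y}; \lambda, \{\gamma_i, \mathbf{B}_i\}_{i=1}^g) = \mathcal{N}(\boldsymbol{\mu}_x, \boldsymbol{\Sigma}_x)$ with
$\boldsymbol{\mu}_x = \boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y}$ and $\boldsymbol{\Sigma}_x = (\boldsymbol{\Sigma}_0^{-1} + \frac{1}{\lambda}\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}$. Once the parameters $\lambda, \{\gamma_i, \mathbf{B}_i\}_{i=1}^g$ are
estimated, the Maximum-A-Posteriori (MAP) estimate of $\mathbf{x}$, denoted by $\hat{\mathbf{x}}$, can be directly obtained from
the mean of the posterior, i.e.,

$$\hat{\mathbf{x}} \leftarrow \boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y}.$$

The parameters can be estimated by a Type II maximum likelihood procedure [14]. This is equivalent
to minimizing the following cost function [11]

$$\mathcal{L}(\Theta) \triangleq \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T| + \mathbf{y}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y}, \tag{3}$$

where $\Theta$ denotes all the parameters $\lambda, \{\gamma_i, \mathbf{B}_i\}_{i=1}^g$. This framework is called the BSBL framework [11].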
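As an illustration of the quantities above, the following is a minimal NumPy sketch (not the authors' Matlab code) that evaluates the posterior mean, the posterior covariance, and the cost (3) for fixed parameters. The function names `block_diag` and `bsbl_posterior` are hypothetical. The posterior covariance is computed in the matrix-inversion-lemma form $\boldsymbol{\Sigma}_0 - \boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\boldsymbol{\Phi}\boldsymbol{\Sigma}_0$, which equals the expression in the text when $\boldsymbol{\Sigma}_0$ is invertible and remains usable when some $\gamma_i = 0$.

```python
import numpy as np

def block_diag(mats):
    """Assemble square matrices into one block diagonal matrix."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n))
    k = 0
    for m in mats:
        d = m.shape[0]
        out[k:k + d, k:k + d] = m
        k += d
    return out

def bsbl_posterior(Phi, y, gammas, Bs, lam):
    """Posterior mean and covariance of x, plus the cost (3), for fixed parameters."""
    Sigma0 = block_diag([g * B for g, B in zip(gammas, Bs)])
    M = Phi.shape[0]
    Sigma_y = lam * np.eye(M) + Phi @ Sigma0 @ Phi.T
    # K = Sigma0 Phi^T Sigma_y^{-1}, obtained without forming an explicit inverse
    K = np.linalg.solve(Sigma_y, Phi @ Sigma0).T
    mu_x = K @ y
    Sigma_x = Sigma0 - K @ Phi @ Sigma0
    _, logdet = np.linalg.slogdet(Sigma_y)
    cost = logdet + y @ np.linalg.solve(Sigma_y, y)
    return mu_x, Sigma_x, cost
```

Working with the $M \times M$ matrix $\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T$ rather than an $N \times N$ inverse is a design choice of this sketch, motivated by $M \ll N$.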
Each algorithm derived from this framework includes three learning rules, i.e., the learning rules for
$\gamma_i$, $\mathbf{B}_i$, and $\lambda$. The learning rule for $\gamma_i$ is the main body of an algorithm. Different $\gamma_i$ learning rules lead
to different convergence speeds¹, and determine the best possible recovery performance when optimal
values of $\lambda$ and $\mathbf{B}_i$ are given.

The $\lambda$ learning rule is important as well. If an optimal (or a good sub-optimal) value for $\lambda$ cannot be
obtained, the recovery performance can be very poor even if the $\gamma_i$ learning rule could potentially lead
to perfect recovery performance.

As for $\mathbf{B}_i$ ($\forall i$), it can be shown [11] that in noiseless environments, the global minimum of (3) always
leads to the true sparse solution irrespective of the value of $\mathbf{B}_i$; $\mathbf{B}_i$ only affects local convergence (such
as changing the shape of the basins of attraction of local minima). Therefore, one can impose various
constraints on the form of $\mathbf{B}_i$ to achieve better performance and prevent overfitting.

An interesting property of the framework is that it is capable of directly recovering less-sparse or 
non-sparse signals as shown in [10]. 



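1 The $\lambda$ learning rule also affects the speed, but its effect is not dominant.

III. Algorithms When the Block Partition is Known

In this section we propose three algorithms, which require knowledge of the block partition (2).

A. BSBL-EM: the EM Method

This algorithm can be readily derived from our previous work [11] on MMV models with suitable
adaptation. Thus we omit details on algorithm derivation. However, several necessary changes, particularly
for enhancing the robustness of the learning rules for $\lambda$ and $\mathbf{B}_i$, have to be made here.

Following the Expectation Maximization (EM) method [11], we can derive the learning rules for $\gamma_i$
and $\lambda$:

$$\gamma_i \leftarrow \frac{1}{d_i}\mathrm{Tr}\big[\mathbf{B}_i^{-1}\big(\boldsymbol{\Sigma}_x^i + \boldsymbol{\mu}_x^i(\boldsymbol{\mu}_x^i)^T\big)\big], \quad \forall i \tag{4}$$

$$\lambda \leftarrow \frac{\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\mu}_x\|_2^2 + \mathrm{Tr}(\boldsymbol{\Sigma}_x\boldsymbol{\Phi}^T\boldsymbol{\Phi})}{M}, \tag{5}$$

where $\boldsymbol{\mu}_x^i \in \mathbb{R}^{d_i \times 1}$ is the corresponding $i$-th block in $\boldsymbol{\mu}_x$, and $\boldsymbol{\Sigma}_x^i \in \mathbb{R}^{d_i \times d_i}$ is the corresponding $i$-th
principal diagonal block in $\boldsymbol{\Sigma}_x$. Note that the $\lambda$ learning rule (5) is not robust in low SNR cases. By
numerical study, we empirically find that this is due in part to the disturbance caused by the
off-block-diagonal elements in $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$. Therefore, we set their off-block-diagonal elements to zero, leading
to the learning rule

$$\lambda \leftarrow \frac{\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\mu}_x\|_2^2 + \sum_{i=1}^{g}\mathrm{Tr}\big(\boldsymbol{\Sigma}_x^i(\boldsymbol{\Phi}^i)^T\boldsymbol{\Phi}^i\big)}{M}, \tag{6}$$

where $\boldsymbol{\Phi}^i \in \mathbb{R}^{M \times d_i}$ is the submatrix of $\boldsymbol{\Phi}$ which corresponds to the $i$-th block of $\mathbf{x}$. This $\lambda$ learning
rule is better than (5) in generally noisy environments (e.g., SNR < 20 dB). In noiseless cases there is
no need to use any $\lambda$ learning rules. Just fixing $\lambda$ to a small value, e.g., $10^{-10}$, can yield satisfactory
performance.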
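The two EM updates can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the $\gamma_i$ update is taken to be the standard EM form $\mathrm{Tr}[\mathbf{B}_i^{-1}(\boldsymbol{\Sigma}_x^i + \boldsymbol{\mu}_x^i(\boldsymbol{\mu}_x^i)^T)]/d_i$, and the $\lambda$ update uses only the block-diagonal parts of $\boldsymbol{\Sigma}_x$ and $\boldsymbol{\Phi}^T\boldsymbol{\Phi}$, as described for rule (6). The function name and the `blocks` argument (a list of index arrays, one per block) are hypothetical.

```python
import numpy as np

def em_update_gamma_lambda(Phi, y, mu_x, Sigma_x, Bs, blocks):
    """One EM-style update of all gamma_i and of lambda.

    blocks: list of integer index arrays, one per block of x (assumed layout).
    """
    M = Phi.shape[0]
    gammas = []
    trace_term = 0.0
    for idx, B in zip(blocks, Bs):
        mu_i = mu_x[idx]
        Sig_i = Sigma_x[np.ix_(idx, idx)]          # i-th principal diagonal block
        second_moment = Sig_i + np.outer(mu_i, mu_i)
        gammas.append(np.trace(np.linalg.solve(B, second_moment)) / len(idx))
        Phi_i = Phi[:, idx]                        # columns of Phi for block i
        trace_term += np.trace(Sig_i @ Phi_i.T @ Phi_i)
    resid = y - Phi @ mu_x
    lam = (resid @ resid + trace_term) / M
    return np.array(gammas), lam
```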

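Similar to [11], using the EM method we can derive a learning rule for $\mathbf{B}_i$. However, assigning a
different $\mathbf{B}_i$ to each block can result in overfitting. When blocks have the same size, an effective strategy
to avoid the overfitting is parameter averaging [11], i.e., constraining $\mathbf{B}_i = \mathbf{B}$ ($\forall i$). Using this constraint,
the learning rule for $\mathbf{B}$ can be derived as follows

$$\mathbf{B} \leftarrow \frac{1}{g}\sum_{i=1}^{g}\frac{\boldsymbol{\Sigma}_x^i + \boldsymbol{\mu}_x^i(\boldsymbol{\mu}_x^i)^T}{\gamma_i}. \tag{7}$$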

However, the algorithm's performance can be improved by further constraining the matrix $\mathbf{B}$. The idea
is to find a positive definite and symmetric matrix $\hat{\mathbf{B}}$ such that it is determined by one parameter but is
close to $\mathbf{B}$ especially along the main diagonal and the main sub-diagonal. Further, we find that for many
applications modeling elements of a block as a first-order Auto-Regressive (AR) process is sufficient to
model the intra-block correlation. In this case, the corresponding correlation matrix of the block is a
Toeplitz matrix with the following form:

$$\hat{\mathbf{B}} = \begin{bmatrix} 1 & r & \cdots & r^{d-1} \\ \vdots & & & \vdots \\ r^{d-1} & r^{d-2} & \cdots & 1 \end{bmatrix} \triangleq \mathrm{Toeplitz}([1, r, \cdots, r^{d-1}]) \tag{8}$$

where $r$ is the AR coefficient and $d$ is the block size. Here we constrain $\hat{\mathbf{B}}$ to have this form. Instead
of estimating $r$ from the BSBL cost function, we empirically calculate its value by $r \triangleq m_1/m_0$, where $m_0$
(resp. $m_1$) is the average of the elements along the main diagonal (resp. the main sub-diagonal) of the
matrix $\mathbf{B}$ in (7).
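The AR(1) constraint is simple to state in code. The sketch below (hypothetical helper names, assuming a block size of at least 2) builds the Toeplitz matrix from a given $r$ and estimates $r$ from an unconstrained estimate of $\mathbf{B}$ as the ratio of the mean sub-diagonal to the mean diagonal. The clamp to magnitude 0.99 follows the modification mentioned in the experiments section, which keeps $-1 < r < 1$.

```python
import numpy as np

def ar1_toeplitz(r, d):
    """d x d matrix Toeplitz([1, r, ..., r^(d-1)]) with entries r^|i-j|."""
    idx = np.arange(d)
    return r ** np.abs(idx[:, None] - idx[None, :])

def estimate_r(B, r_max=0.99):
    """Estimate the AR(1) coefficient of B as m1/m0, clamped to |r| <= r_max."""
    m0 = np.mean(np.diag(B))        # average along the main diagonal
    m1 = np.mean(np.diag(B, k=1))   # average along the main sub-diagonal
    r = m1 / m0
    return np.sign(r) * min(abs(r), r_max)
```

For example, `ar1_toeplitz(estimate_r(B), B.shape[0])` would give the one-parameter replacement for an estimated `B`.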

When blocks have different sizes, the above idea can still be used. First, using the EM method we
can derive the rule for each $\mathbf{B}_i$: $\mathbf{B}_i \leftarrow \frac{1}{\gamma_i}\big[\boldsymbol{\Sigma}_x^i + \boldsymbol{\mu}_x^i(\boldsymbol{\mu}_x^i)^T\big]$. Then, for each $\mathbf{B}_i$ we calculate the averages
of the elements along the main diagonal and the main sub-diagonal, i.e., $m_0^i$ and $m_1^i$, respectively, and
average $m_0^i$ and $m_1^i$ for all blocks as follows: $\bar{m}_0 \triangleq \sum_{i=1}^{g} m_0^i$ and $\bar{m}_1 \triangleq \sum_{i=1}^{g} m_1^i$. Finally, we have
$\bar{r} \triangleq \bar{m}_1/\bar{m}_0$, from which we construct $\hat{\mathbf{B}}_i$ for the $i$-th block:

$$\hat{\mathbf{B}}_i = \mathrm{Toeplitz}([1, \bar{r}, \cdots, \bar{r}^{d_i-1}]) \quad (\forall i) \tag{9}$$

We denote the above algorithm by BSBL-EM.

B. BSBL-BO: the Bound-Optimization Method 

The BSBL-EM algorithm has satisfactory recovery performance but is slow. This is mainly due to the
EM based $\gamma_i$ learning rule. For the basic SBL algorithm, Tipping [14] derived a fixed-point based $\gamma_i$
learning rule to replace the EM based one, which has faster convergence speed but is not robust in some
noisy environments. Here we derive a fast $\gamma_i$ learning rule based on the bound-optimization method (also
known as the Majorization-Minimization method) [1], [15]. The algorithm adopting this $\gamma_i$ learning rule
is denoted by BSBL-BO (it uses the same learning rules for $\mathbf{B}_i$ and $\lambda$ as BSBL-EM). It not only has
fast speed, but also has satisfactory performance.

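Note that the original cost function (3) consists of two terms. The first term $\log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T|$ is
concave with respect to $\boldsymbol{\gamma} \succeq \mathbf{0}$, where $\boldsymbol{\gamma} \triangleq [\gamma_1, \cdots, \gamma_g]^T$. The second term $\mathbf{y}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y}$ is
convex with respect to $\boldsymbol{\gamma} \succeq \mathbf{0}$. Since our goal is to minimize the cost function, we choose to find an
upper-bound for the first term and then minimize the upper-bound of the cost function.

We use the supporting hyperplane of the first term as its upper-bound. Let $\boldsymbol{\gamma}^*$ be a given point in the
$\boldsymbol{\gamma}$-space. We have

$$\begin{aligned}
\log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T| &\le \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0^*\boldsymbol{\Phi}^T| + \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)(\gamma_i - \gamma_i^*) \\
&= \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i + \log|\boldsymbol{\Sigma}_y^*| - \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i^*
\end{aligned} \tag{10}$$

where $\boldsymbol{\Sigma}_y^* = \lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0^*\boldsymbol{\Phi}^T$ and $\boldsymbol{\Sigma}_0^* = \boldsymbol{\Sigma}_0|_{\boldsymbol{\gamma}=\boldsymbol{\gamma}^*}$. Substituting (10) into the cost function (3) we have

$$\begin{aligned}
\mathcal{L}(\boldsymbol{\gamma}) &\le \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i + \mathbf{y}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y} + \log|\boldsymbol{\Sigma}_y^*| - \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i^* \\
&\triangleq \tilde{\mathcal{L}}(\boldsymbol{\gamma})
\end{aligned} \tag{11}$$

The function $\tilde{\mathcal{L}}(\boldsymbol{\gamma})$ is convex over $\boldsymbol{\gamma}$, and when $\boldsymbol{\gamma} = \boldsymbol{\gamma}^*$ we have $\mathcal{L}(\boldsymbol{\gamma}^*) = \tilde{\mathcal{L}}(\boldsymbol{\gamma}^*)$. Further, for any $\boldsymbol{\gamma}_{\min}$
which minimizes $\tilde{\mathcal{L}}(\boldsymbol{\gamma})$, we have the following relationship: $\mathcal{L}(\boldsymbol{\gamma}_{\min}) \le \tilde{\mathcal{L}}(\boldsymbol{\gamma}_{\min}) \le \tilde{\mathcal{L}}(\boldsymbol{\gamma}^*) = \mathcal{L}(\boldsymbol{\gamma}^*)$.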
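This indicates that when we minimize the surrogate function $\tilde{\mathcal{L}}(\boldsymbol{\gamma})$ over $\boldsymbol{\gamma}$, the resulting minimum point
effectively decreases the original cost function $\mathcal{L}(\boldsymbol{\gamma})$. We can use any optimization software to optimize
(11). However, our experiments showed that this takes more time than BSBL-EM and leads to poorer
recovery performance. Therefore, we consider another surrogate function. Using the identity

$$\mathbf{y}^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\mathbf{y} \equiv \min_{\mathbf{x}}\Big[\frac{1}{\lambda}\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x}\Big] \tag{12}$$

where the optimal $\mathbf{x}$ is $\boldsymbol{\mu}_x$, we have

$$\tilde{\mathcal{L}}(\boldsymbol{\gamma}) = \min_{\mathbf{x}}\; \frac{1}{\lambda}\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x} + \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i + \log|\boldsymbol{\Sigma}_y^*| - \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i^*.$$

Then, a new function

$$G(\boldsymbol{\gamma}, \mathbf{x}) \triangleq \frac{1}{\lambda}\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x} + \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i + \log|\boldsymbol{\Sigma}_y^*| - \sum_{i=1}^{g}\mathrm{Tr}\big((\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i(\boldsymbol{\Phi}^i)^T\big)\gamma_i^*$$

is defined, which is the upper-bound of $\tilde{\mathcal{L}}(\boldsymbol{\gamma})$. Note that $G(\boldsymbol{\gamma}, \mathbf{x})$ is convex in both $\boldsymbol{\gamma}$ and $\mathbf{x}$. It can be
easily shown that the solution ($\boldsymbol{\gamma}^\diamond$) of $\tilde{\mathcal{L}}(\boldsymbol{\gamma})$ is the solution ($\boldsymbol{\gamma}^\diamond, \mathbf{x}^\diamond$) of $G(\boldsymbol{\gamma}, \mathbf{x})$. Thus, $G(\boldsymbol{\gamma}, \mathbf{x})$ is our
final surrogate cost function.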

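Taking the derivative of $G$ with respect to $\gamma_i$ and setting it to zero, we can obtain

$$\gamma_i \leftarrow \sqrt{\frac{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}{\mathrm{Tr}\big((\boldsymbol{\Phi}^i)^T(\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i\big)}}, \quad \forall i \tag{13}$$

Due to this $\gamma_i$ learning rule, BSBL-BO requires far fewer iterations than BSBL-EM, but both algorithms
have comparable performance.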
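A minimal NumPy sketch of this bound-optimization update follows. It assumes the update is the stationary point of the surrogate with respect to $\gamma_i$, namely $\gamma_i = \sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i / \mathrm{Tr}((\boldsymbol{\Phi}^i)^T(\boldsymbol{\Sigma}_y^*)^{-1}\boldsymbol{\Phi}^i\mathbf{B}_i)}$, with $\mathbf{x}$ set to the current posterior mean and $\boldsymbol{\Sigma}_y^*$ built from the current $\gamma_i$. The function name and the `blocks` argument (index arrays per block) are hypothetical.

```python
import numpy as np

def bo_update_gamma(Phi, x, Bs, blocks, gammas, lam):
    """One bound-optimization update of all gamma_i, given the current iterate.

    x      : current estimate of the signal (e.g. the posterior mean)
    gammas : current gamma values, used to form Sigma_y^*
    """
    M = Phi.shape[0]
    # Sigma_y^* = lam*I + sum_i gamma_i * Phi_i B_i Phi_i^T
    Sigma_y = lam * np.eye(M)
    for idx, B, g in zip(blocks, Bs, gammas):
        Phi_i = Phi[:, idx]
        Sigma_y += g * (Phi_i @ B @ Phi_i.T)
    new_gammas = np.empty(len(blocks))
    for i, (idx, B) in enumerate(zip(blocks, Bs)):
        Phi_i = Phi[:, idx]
        x_i = x[idx]
        num = x_i @ np.linalg.solve(B, x_i)                           # x_i^T B_i^{-1} x_i
        den = np.trace(Phi_i.T @ np.linalg.solve(Sigma_y, Phi_i) @ B)
        new_gammas[i] = np.sqrt(num / den)
    return new_gammas
```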

C. BSBL-$\ell_1$: Hybrid of BSBL and Group-Lasso Type Algorithms

Essentially, BSBL-EM and BSBL-BO operate in the $\boldsymbol{\gamma}$-space, since their cost function is a function
of $\boldsymbol{\gamma}$. In contrast, most existing algorithms for the block sparse model (1)-(2) directly operate in the
$\mathbf{x}$-space, minimizing a data fit term and a penalty, which are both functions of $\mathbf{x}$. It is interesting to see
the relation between our BSBL algorithms and those algorithms.
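Using the idea we presented in [16], an extension of the duality space analysis for the basic SBL
framework [17], we can transform the BSBL cost function (3) from the $\boldsymbol{\gamma}$-space to the $\mathbf{x}$-space. Since $\lambda$
and $\mathbf{B}_i$ ($\forall i$) can be viewed as regularizers, for convenience we first treat them as fixed values.

First, using the identity (12) we can upper-bound the BSBL cost function as follows:

$$\mathcal{L}(\mathbf{x}, \boldsymbol{\gamma}) = \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T| + \frac{1}{\lambda}\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x}.$$

By first minimizing over $\boldsymbol{\gamma}$ and then minimizing over $\mathbf{x}$, we have:

$$\mathbf{x} = \arg\min_{\mathbf{x}}\big\{\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \lambda g_c(\mathbf{x})\big\}, \tag{14}$$

with the penalty $g_c(\mathbf{x})$ given by

$$g_c(\mathbf{x}) \triangleq \min_{\boldsymbol{\gamma} \succeq \mathbf{0}}\big\{\mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x} + \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T|\big\}. \tag{15}$$

Define $h(\boldsymbol{\gamma}) \triangleq \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T|$. It is concave and non-decreasing w.r.t. $\boldsymbol{\gamma} \succeq \mathbf{0}$. Thus we have

$$\log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T| = \min_{\mathbf{z} \succeq \mathbf{0}}\; \mathbf{z}^T\boldsymbol{\gamma} - h^*(\mathbf{z}) \tag{16}$$

where $h^*(\mathbf{z})$ is the concave conjugate of $h(\boldsymbol{\gamma})$ and can be expressed as
$h^*(\mathbf{z}) = \min_{\boldsymbol{\gamma} \succeq \mathbf{0}} \mathbf{z}^T\boldsymbol{\gamma} - \log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T|$. Thus, using (16) we can express (15) as

$$\begin{aligned}
g_c(\mathbf{x}) &= \min_{\boldsymbol{\gamma}, \mathbf{z} \succeq \mathbf{0}}\; \mathbf{x}^T\boldsymbol{\Sigma}_0^{-1}\mathbf{x} + \mathbf{z}^T\boldsymbol{\gamma} - h^*(\mathbf{z}) \\
&= \min_{\boldsymbol{\gamma}, \mathbf{z} \succeq \mathbf{0}}\; \sum_i\Big(\frac{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}{\gamma_i} + z_i\gamma_i\Big) - h^*(\mathbf{z}).
\end{aligned} \tag{17}$$

Minimizing (17) over $\gamma_i$, we have

$$\gamma_i = z_i^{-1/2}\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}, \quad \forall i \tag{18}$$

Substituting (18) into (17) leads to

$$g_c(\mathbf{x}) = \min_{\mathbf{z} \succeq \mathbf{0}}\; \sum_i\Big(2z_i^{1/2}\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}\Big) - h^*(\mathbf{z}). \tag{19}$$

Using (19), the problem (14) now becomes:

$$\mathbf{x} = \arg\min_{\mathbf{x}}\; \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \lambda\Big[\min_{\mathbf{z} \succeq \mathbf{0}}\; \sum_i\Big(2z_i^{1/2}\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}\Big) - h^*(\mathbf{z})\Big]. \tag{20}$$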
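The passage from (17) to (19) rests on a scalar minimization that can be spelled out in one step. In the sketch below, $a_i$ is shorthand introduced here (not in the original) for $\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i$, and $z_i > 0$ is assumed:

```latex
\begin{align*}
f(\gamma_i) &= \frac{a_i}{\gamma_i} + z_i\gamma_i,
  \qquad a_i \triangleq \mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i \ge 0,\; z_i > 0, \\
f'(\gamma_i) &= -\frac{a_i}{\gamma_i^2} + z_i = 0
  \;\Longrightarrow\; \gamma_i = z_i^{-1/2}\sqrt{a_i}, \\
f\big(z_i^{-1/2}\sqrt{a_i}\big) &= z_i^{1/2}\sqrt{a_i} + z_i^{1/2}\sqrt{a_i}
  = 2\,z_i^{1/2}\sqrt{a_i}.
\end{align*}
```

Summing the last line over $i$ gives the weighted penalty that appears in (19).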
To further simplify the expression, we now calculate the optimal value of $z_i^{1/2}$. However, we need not
calculate this value from the above expression. According to the duality property, from the relation (16)
we can directly obtain the optimal value as follows:

$$z_i^{1/2} = \Big(\frac{\partial\log|\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T|}{\partial\gamma_i}\Big)^{1/2} = \Big(\mathrm{Tr}\big[\mathbf{B}_i(\boldsymbol{\Phi}^i)^T(\lambda\mathbf{I} + \boldsymbol{\Phi}\boldsymbol{\Sigma}_0\boldsymbol{\Phi}^T)^{-1}\boldsymbol{\Phi}^i\big]\Big)^{1/2}. \tag{21}$$

Note that $z_i^{1/2}$ is a function of $\boldsymbol{\gamma}$, while according to (18) $\gamma_i$ is a function of $\mathbf{x}_i$ (and $z_i$). This means that
the problem (20) should be solved in an iterative way. In the $k$-th iteration, having used the update rules
(18) and (21) to obtain $(z_i^{(k)})^{1/2}$, we need to solve the following optimization problem:

$$\mathbf{x}^{(k+1)} = \arg\min_{\mathbf{x}}\; \|\mathbf{y} - \boldsymbol{\Phi}\mathbf{x}\|_2^2 + \lambda\sum_i w_i^{(k)}\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}, \tag{22}$$

where $w_i^{(k)} \triangleq 2(z_i^{(k)})^{1/2}$. The resulting $\mathbf{x}^{(k+1)}$ is then used to update $\gamma_i$ and $z_i$, which are in turn
used to calculate the solution in the next iteration.

The solution to (22) can be calculated using any Group-Lasso type algorithm. To see this, let
$\mathbf{u}_i \triangleq w_i^{(k)}\mathbf{B}_i^{-1/2}\mathbf{x}_i$, $\mathbf{u} \triangleq [\mathbf{u}_1^T, \cdots, \mathbf{u}_g^T]^T$ and $\mathbf{H} \triangleq \boldsymbol{\Phi}\cdot\mathrm{diag}\{\mathbf{B}_1^{1/2}/w_1^{(k)}, \cdots, \mathbf{B}_g^{1/2}/w_g^{(k)}\}$. Then the problem
(22) can be transformed to the following one:

$$\mathbf{u}^{(k+1)} = \arg\min_{\mathbf{u}}\; \|\mathbf{y} - \mathbf{H}\mathbf{u}\|_2^2 + \lambda\sum_i\|\mathbf{u}_i\|_2. \tag{23}$$

Now each iteration is a standard Group-Lasso type problem, while the whole algorithm is an iterative
reweighted algorithm.
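The change of variables that turns the weighted problem into a standard Group-Lasso problem can be checked numerically. The following NumPy sketch uses hypothetical helper names and assumes `blocks` is a list of index arrays that together cover all columns of the sensing matrix. It builds $\mathbf{H}$ and maps $\mathbf{x}$ to $\mathbf{u}$ with the symmetric square root of each $\mathbf{B}_i$; no Group-Lasso solver is included.

```python
import numpy as np

def sym_sqrt(B):
    """Symmetric square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(B)
    return (vecs * np.sqrt(vals)) @ vecs.T

def to_group_lasso(Phi, Bs, w, blocks):
    """Build H = Phi * diag{B_i^{1/2} / w_i}, block by block."""
    H = np.empty_like(Phi, dtype=float)
    for idx, B, w_i in zip(blocks, Bs, w):
        H[:, idx] = Phi[:, idx] @ sym_sqrt(B) / w_i
    return H

def x_to_u(x, Bs, w, blocks):
    """Map x to u with u_i = w_i * B_i^{-1/2} x_i."""
    u = np.empty_like(x, dtype=float)
    for idx, B, w_i in zip(blocks, Bs, w):
        u[idx] = w_i * np.linalg.solve(sym_sqrt(B), x[idx])
    return u
```

With these definitions, `H @ u` equals `Phi @ x`, and the Euclidean norm of each `u[idx]` equals $w_i\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}$, so the data-fit term and the penalty are both preserved.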

In the above development we did not consider the learning rules for the regularizers $\lambda$ and $\mathbf{B}_i$. In fact,
their estimation greatly benefits from this iterative reweighted form. Since each iteration is a Group-Lasso
type problem, the optimal value of $\lambda$ can be automatically selected in the Group Lasso framework [18].
Also, since each iteration provides a block sparse solution, which is close to the true solution, $\mathbf{B}_i$ can
be directly estimated from the solution of the previous iteration. In particular, each nonzero block in the
previous solution can be treated as an AR(1) process, and its AR coefficient is thus estimated. The AR
coefficients associated with all the nonzero blocks are averaged², and the average value, denoted by $\bar{r}$,
is used to construct each $\mathbf{B}_i$ according to (9).

The above algorithm is denoted by BSBL-$\ell_1$. It can be seen as a hybrid of a BSBL algorithm and
a Group-Lasso type algorithm. On the one hand, it has the ability to adaptively learn and exploit
intra-block correlation for better performance, as BSBL-EM and BSBL-BO do. On the other hand, since it only
takes a few iterations (generally about 2 to 5 iterations in noisy environments) and each iteration can be
implemented by any efficient Group-Lasso type algorithm, it is much faster and is more suitable for
large-scale datasets than BSBL-EM and BSBL-BO.

2 The averaging is important. Otherwise, the algorithm may have poor performance.

The algorithm also provides insights if we want to equip Group-Lasso type algorithms with the ability to
exploit intra-block correlation for better recovery performance. We can consider this iterative reweighted
method and change the $\ell_2$ norm of $\mathbf{x}_i$, i.e., $\|\mathbf{x}_i\|_2$, to the Mahalanobis distance type measure $\sqrt{\mathbf{x}_i^T\mathbf{B}_i^{-1}\mathbf{x}_i}$.

IV. Algorithms When the Block Partition is Unknown 

Now we extend the BSBL framework to address the situation when the block partition is unknown.
For the algorithm development, we assume that all the blocks are of equal size $h$ and the nonzero blocks
are arbitrarily located. Later we will see that the approximation of equal block size is not limiting. Note
that though the resulting algorithms are not very sensitive to the choice of $h$, algorithmic performance
can be further improved if a suitable value of $h$ is selected. We will comment more on $h$ later.

Given the identical block size $h$, there are $p \triangleq N - h + 1$ possible (overlapping) blocks in $\mathbf{x}$. The $i$-th
block starts at the $i$-th element of $\mathbf{x}$ and ends at the $(i + h - 1)$-th element. All the nonzero elements of
$\mathbf{x}$ lie within a subset of these blocks. Similar to Section II, for the $i$-th block, we assume it satisfies a
multivariate Gaussian distribution with the mean given by $\mathbf{0}$ and the covariance matrix given by $\gamma_i\mathbf{B}_i$,
where $\mathbf{B}_i \in \mathbb{R}^{h \times h}$. So the prior of $\mathbf{x}$ has the form: $p(\mathbf{x}) \sim \mathcal{N}_x(\mathbf{0}, \boldsymbol{\Sigma}_0)$. Note that due to the overlapping
locations of these blocks, $\boldsymbol{\Sigma}_0$ is no longer a block diagonal matrix. It has the structure that each $\gamma_i\mathbf{B}_i$
lies along the principal diagonal of $\boldsymbol{\Sigma}_0$ and overlaps other neighboring $\gamma_j\mathbf{B}_j$ ($j \ne i$). Thus, we cannot
directly use the BSBL framework and need to make some modifications.

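To facilitate the use of the BSBL framework, we expand the covariance matrix $\boldsymbol{\Sigma}_0$ as follows:

$$\tilde{\boldsymbol{\Sigma}}_0 \triangleq \mathrm{diag}\{\gamma_1\mathbf{B}_1, \cdots, \gamma_p\mathbf{B}_p\} \in \mathbb{R}^{ph \times ph} \tag{24}$$

Note that $\gamma_i\mathbf{B}_i$ no longer overlaps other $\gamma_j\mathbf{B}_j$ ($j \ne i$). The definition of $\tilde{\boldsymbol{\Sigma}}_0$ implies the following
decomposition of $\mathbf{x}$:

$$\mathbf{x} = \sum_{i=1}^{p}\mathbf{E}_i\mathbf{z}_i, \tag{25}$$

where $\mathbf{z}_i \in \mathbb{R}^{h \times 1}$, $E\{\mathbf{z}_i\} = \mathbf{0}$, $E\{\mathbf{z}_i\mathbf{z}_j^T\} = \delta_{ij}\gamma_i\mathbf{B}_i$ ($\delta_{ij} = 1$ if $i = j$; otherwise, $\delta_{ij} = 0$), and
$\mathbf{z} \triangleq [\mathbf{z}_1^T, \cdots, \mathbf{z}_p^T]^T \sim \mathcal{N}_z(\mathbf{0}, \tilde{\boldsymbol{\Sigma}}_0)$. $\mathbf{E}_i \in \mathbb{R}^{N \times h}$ is a zero matrix except that the part from its $i$-th row to
$(i + h - 1)$-th row is replaced by the identity matrix $\mathbf{I}$. Then the original model (1) can be expressed as:

$$\mathbf{y} = \sum_{i=1}^{p}\boldsymbol{\Phi}\mathbf{E}_i\mathbf{z}_i + \mathbf{v} \triangleq \mathbf{A}\mathbf{z} + \mathbf{v}, \tag{26}$$

where $\mathbf{A} \triangleq [\mathbf{A}_1, \cdots, \mathbf{A}_p]$ with $\mathbf{A}_i \triangleq \boldsymbol{\Phi}\mathbf{E}_i$. Now the new model (26) is a block sparse model and can
be solved by the BSBL framework. Thus, following the development of BSBL-EM, BSBL-BO, and
BSBL-$\ell_1$, we obtain algorithms for this expanded model, which are called EBSBL-EM, EBSBL-BO,
and EBSBL-$\ell_1$, respectively.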
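The expanded model is easy to construct explicitly, because multiplying the sensing matrix by the selection matrix of the $i$-th overlapping block simply picks out $h$ consecutive columns. A minimal NumPy sketch with hypothetical function names (0-based indexing, so block `i` covers columns `i` to `i + h - 1`):

```python
import numpy as np

def expand_dictionary(Phi, h):
    """Return A = [Phi E_1, ..., Phi E_p] for p = N - h + 1 overlapping blocks."""
    N = Phi.shape[1]
    p = N - h + 1
    # Phi E_i is the slice of h consecutive columns starting at column i
    return np.hstack([Phi[:, i:i + h] for i in range(p)])

def collapse(z, N, h):
    """Map z = [z_1; ...; z_p] back to x = sum_i E_i z_i."""
    p = N - h + 1
    x = np.zeros(N)
    for i in range(p):
        x[i:i + h] += z[i * h:(i + 1) * h]
    return x
```

An estimate of the expanded vector obtained from any block sparse solver applied to `A` with `p` non-overlapping blocks of size `h` can then be mapped back to the signal domain with `collapse`. Note that `A` has `p * h` columns, so the expanded problem is roughly `h` times wider than the original one.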

In the derivation above we assume that all blocks have the equal known size, $h$. However, this
assumption is not crucial for practical use. When the size of a nonzero block of $\mathbf{x}$, say $\mathbf{x}_j$, is greater
than or equal to $h$, it can be recovered by a set of (overlapped) $\mathbf{z}_i$ ($i \in S$, where $S$ is a non-empty set). When
the size of $\mathbf{x}_j$ is less than $h$, it can be recovered by a $\mathbf{z}_i$ for some $i$. In this case, since the size of $\mathbf{z}_i$
is greater than the size of $\mathbf{x}_j$, the elements in $\mathbf{z}_i$ which do not overlap with elements of $\mathbf{x}_j$ are very
close to zero. The experiments in Section V and in [13] show that different values of $h$ lead to similar
performance.

The above insight also implies that even if the block partition is unknown, one can partition a signal
into a number of non-overlapping blocks with user-defined block sizes, and then perform the BSBL
algorithms. Nonetheless, performance of the BSBL algorithms is generally more sensitive to the block
sizes than that of the EBSBL algorithms when recovering block sparse signals [19]³.

Use of the expanded model when the block partition is unknown is quite different from existing
approaches [7]-[9]. Our new approach has several advantages. Firstly, it simplifies the algorithms, which,
in turn, increases robustness in noisy environments, as shown in Section V. Secondly, it facilitates
exploitation of intra-block correlation. Intra-block correlation is common in practical applications. Exploiting such
correlation can significantly improve performance, yielding an advantage to our approach over existing
methods which ignore intra-block correlation.

V. Experiments 

Due to space limitations, we only present some representative experimental results based on computer
simulations⁴. Experiments on real-world data can be found in [10].

In the following, each experiment was repeated for 400 trials. In each trial the matrix $\boldsymbol{\Phi}$ was
generated as a zero mean random Gaussian matrix with columns normalized to unit $\ell_2$ norm. In noisy
experiments the Normalized Mean Square Error (NMSE) was used as a performance index, defined by
$\|\hat{\mathbf{x}} - \mathbf{x}_{\mathrm{gen}}\|_2^2/\|\mathbf{x}_{\mathrm{gen}}\|_2^2$, where $\hat{\mathbf{x}}$ was the estimate of the true signal $\mathbf{x}_{\mathrm{gen}}$. In noiseless experiments the
success rate was used as a performance index, defined as the percentage of successful trials in the 400
trials (a successful trial was defined as one with NMSE < $10^{-5}$).

3 When directly recovering non-sparse signals, performance of the BSBL algorithms is not sensitive to block sizes [10].
4 Matlab codes can be downloaded at http://dsp.ucsd.edu/~zhilin/BSBL.html

Fig. 1. Empirical 99% phase transitions of all the algorithms (a) when the intra-block correlation was 0, and (b) when the
intra-block correlation was 0.95. Each point on a phase transition curve corresponds to the success rate larger than or equal to
0.99.

In noiseless experiments, we chose the Mixed $\ell_2/\ell_1$ Program [6] to solve (23) in each iteration
of BSBL-$\ell_1$; in noisy experiments, we chose the Group Basis Pursuit for this purpose. For all of
our algorithms, when calculating $r$, instead of using the original formula $r = m_1/m_0$, the formula
$r = \mathrm{sign}(m_1/m_0)\min\{|m_1/m_0|, 0.99\}$ was used to ensure that the calculated $r$ satisfies $-1 < r < 1$. The same
modification applies to $\bar{r}$.
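The two performance indices used in this section are short enough to state as code. The sketch below uses hypothetical function names; it returns the success rate as a fraction rather than a percentage and applies the strict threshold NMSE < $10^{-5}$ as written in the text.

```python
import numpy as np

def nmse(x_hat, x_gen):
    """Normalized mean square error ||x_hat - x_gen||_2^2 / ||x_gen||_2^2."""
    return np.sum((x_hat - x_gen) ** 2) / np.sum(x_gen ** 2)

def success_rate(nmse_values, tol=1e-5):
    """Fraction of trials whose NMSE falls below the success threshold."""
    return float(np.mean(np.asarray(nmse_values) < tol))
```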

A. Phase Transition 

We first examined empirical phase transitions [20]⁵ in exact recovery of block sparse signals in
noiseless environments for our three BSBL algorithms, Block-OMP, Model-CoSaMP, the Mixed $\ell_2/\ell_1$
Program, and Group Basis Pursuit. The phase transition is generally used to illustrate how sparsity level
(defined as $\rho = K/M$, where $K$ is the number of nonzero elements in $\mathbf{x}$) and indeterminacy (defined as
$\delta = M/N$) affect each algorithm's success in exact recovery of sparse signals. Each point on the plotted
phase transition curve corresponds to an algorithm's success rate greater than or equal to 99% in 400
trials. Above the curve the success rate sharply drops.

5 The phase transition graph was initially used to describe each algorithm's ability to recover a sparse signal with no structure.
In this experiment it was used to describe each algorithm's ability to recover a sparse signal with a block structure.

In the experiment we varied the indeterminacy $\delta = M/N$ from 0.05 to 0.5 with $N$ fixed to 1000. For
each $M$ and $N$, a block sparse signal was generated, which consisted of 40 blocks with an identical block
size of 25 elements. The number of nonzero blocks varied from 1 to 20; thus the number of nonzero
elements varied from 25 to 500. The locations of the nonzero blocks were determined randomly. The block
partition was known to the algorithms, but the number of nonzero blocks and their locations were unknown
to the algorithms. Each nonzero block was generated by a multivariate Gaussian distribution with zero
mean and covariance matrix $\boldsymbol{\Sigma}_{\mathrm{gen}}$. By manipulating the covariance matrix, and thus changing intra-block
correlation, we examined the effect of intra-block correlation on each algorithm's phase transition.
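A sketch of how such test data could be generated in NumPy is given below. It is not the authors' Matlab code: the function names are hypothetical, and the covariance of each nonzero block is assumed to be the AR(1) Toeplitz matrix with entries $r^{|i-j|}$ for a chosen correlation $|r| < 1$ (so $r = 0$ gives the identity).

```python
import numpy as np

def gen_block_sparse(n_blocks, block_size, n_nonzero, r, rng):
    """Block sparse signal whose nonzero blocks are N(0, Toeplitz([1, r, ...]))."""
    idx = np.arange(block_size)
    Sigma_gen = r ** np.abs(idx[:, None] - idx[None, :])
    L = np.linalg.cholesky(Sigma_gen)          # Sigma_gen = L L^T
    x = np.zeros(n_blocks * block_size)
    active = rng.choice(n_blocks, size=n_nonzero, replace=False)
    for b in active:
        x[b * block_size:(b + 1) * block_size] = L @ rng.standard_normal(block_size)
    return x, np.sort(active)

def gen_sensing_matrix(M, N, rng):
    """Zero-mean Gaussian matrix with columns normalized to unit l2 norm."""
    Phi = rng.standard_normal((M, N))
    return Phi / np.linalg.norm(Phi, axis=0)
```

For the setting described above one might call `gen_block_sparse(40, 25, k, 0.95, np.random.default_rng(0))` for each number `k` of nonzero blocks.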

We first considered the situation when the intra-block correlation was 0 (i.e., $\boldsymbol{\Sigma}_{\mathrm{gen}} = \mathbf{I}$). The empirical
phase transition curves of all the algorithms are shown in Fig. 1 (a). We can see that the three BSBL
algorithms had the best performance, and the phase transition curves of BSBL-EM and BSBL-BO were
identical. It is worth noting that when $\delta > 0.15$, BSBL-$\ell_1$ exactly recovered block sparse signals with
$\rho = 1$ with a high success rate (≥ 99%).

The results became more interesting when the intra-block correlation was 0.95 (that is, $\boldsymbol{\Sigma}_{\mathrm{gen}} =
\mathrm{Toeplitz}([1, 0.95, \cdots, 0.95^{24}])$). The empirical phase transition curves are shown in Fig. 1 (b), where
all three BSBL algorithms had improved performance. BSBL-$\ell_1$ exactly recovered sparse signals
with $\rho = 1$ even for $\delta < 0.15$. BSBL-EM and BSBL-BO could exactly recover sparse signals with $\rho = 1$
when $\delta > 0.25$. In contrast, all four non-BSBL algorithms showed little change in performance when
the intra-block correlation changed from 0 to 0.95.

These results are very interesting and surprising, since this may be the first time that an algorithm
shows the ability to recover a block sparse signal of $M$ nonzero elements from $M$ measurements with
a high success rate (≥ 99%). Obviously, exploiting block structure and intra-block correlation plays a
crucial role here, indicating the advantages of the BSBL framework.

B. Benefit of Exploiting Intra-Block Correlation 

The above results suggest there is a benefit to exploiting intra-block correlation. To further clarify
this, another noiseless experiment was carried out. The matrix $\boldsymbol{\Phi}$ was of the size 100 × 300. The signal
consisted of 75 blocks with an identical size of 4 elements. Only 20 of the blocks were nonzero. All the
nonzero blocks had the same intra-block correlation (generated as in Section V-A) ranging from -0.99 to
0.99. Different from the first experiment, each nonzero block was further normalized to unit $\ell_2$ norm in
order to remove the interference caused by different $\ell_2$ norms of the blocks.

Fig. 2. (a) shows the benefit of exploiting the intra-block correlation, (b) shows the performance of BSBL-EM for three
correlation conditions.

BSBL-EM, BSBL-BO and BSBL-$\ell_1$ were applied with and without correlation exploitation. In the first
case, they adaptively learned and exploited the intra-block correlation. In the second case, they ignored
the correlation, i.e., fixing $\mathbf{B}_i = \mathbf{I}$ ($\forall i$).

The results are shown in Fig. 2 (a). First, we see that exploiting the intra-block correlation greatly
improved the performance of the BSBL algorithms. Second, when ignoring the intra-block correlation,
the performance of the BSBL algorithms showed no obvious relation to the correlation⁶. In other
words, no obvious negative effect is observed if ignoring the intra-block correlation. Note that the second
observation is quite different from the observation on temporal correlation in MMV models [11], where
we found that if temporal correlation is not exploited, algorithms have poorer performance with increasing
temporal correlation values⁷.

6 This phenomenon can also be observed from the performance of the compared algorithms in Section V-A, where their
performance had little change when intra-block correlation dramatically varied.

In the previous experiment all the nonzero blocks had the same intra-block correlation. We might
then ask whether the proposed algorithms can still succeed when the intra-block correlation for nonzero
blocks is not homogeneous. To answer this question, we considered three possible cases for choosing
intra-block correlation for each nonzero block: (1) correlation values were chosen uniformly from -1 to
1; (2) correlation values were chosen uniformly from 0 to 1; (3) correlation values were chosen uniformly
from 0.7 to 1.

BSBL-EM was then applied with and without correlation exploitation, as described in the previous
experiment. The results are shown in Fig. 2 (b), with the three correlation cases indicated by 'Case 1',
'Case 2', and 'Case 3', respectively. We can see in Case 3 (least variation in intra-block correlation)
the benefit of exploiting the correlation was significant, while in Case 1 (most variation in intra-block
correlation) the benefit disappeared (but exploiting the correlation was not harmful). However, Case 1
rarely happens in practice. In most practical problems the intra-block correlation values of all nonzero
blocks tend to be positive and high, which corresponds to Case 2 and Case 3.

C. Performance in Noisy Environments 

We compared the BSBL algorithms, Mixed $\ell_2/\ell_1$ Program, Group Lasso, and Group Basis Pursuit
at different noise levels. In this experiment $M = 128$ and $N = 512$. The generated block sparse
signal was partitioned into 64 blocks with an identical block size of 8 elements. Seven blocks were
nonzero, generated as in Section V-A. The intra-block correlation value for each block was uniformly
randomly varied from 0.8 to 1. Gaussian white noise was added so that the SNR, defined by SNR(dB) $\triangleq$
$20\log_{10}(\|\boldsymbol{\Phi}\mathbf{x}_{\mathrm{gen}}\|_2/\|\mathbf{v}\|_2)$, ranged from 5 dB to 25 dB for each generated signal. As a benchmark result,
the 'oracle' result was calculated, which was the least-square estimate of $\mathbf{x}_{\mathrm{gen}}$ given its true support.
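The noise scaling and the oracle benchmark can be sketched as follows. The function names are hypothetical, and the noise is assumed to be rescaled so that each trial meets the target SNR exactly under the definition above; the oracle is the least-squares fit restricted to the columns of the true support.

```python
import numpy as np

def add_noise(Phi, x_gen, snr_db, rng):
    """Return y = Phi x_gen + v with v scaled to the requested SNR in dB."""
    clean = Phi @ x_gen
    v = rng.standard_normal(clean.shape)
    # Scale v so that 20*log10(||clean||_2 / ||v||_2) equals snr_db
    v *= np.linalg.norm(clean) / (np.linalg.norm(v) * 10 ** (snr_db / 20))
    return clean + v, v

def oracle_estimate(Phi, y, support):
    """Least-squares estimate of x restricted to the known support."""
    x = np.zeros(Phi.shape[1])
    x[support] = np.linalg.lstsq(Phi[:, support], y, rcond=None)[0]
    return x
```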

The results are shown in Fig. 3 (a). All three BSBL algorithms exhibited significant performance gains
over non-BSBL algorithms. In particular, the performance curves of BSBL-EM and BSBL-BO were
nearly identical to that of the 'oracle'. The phenomenon that BSBL-$\ell_1$ had slightly poorer performance
at low SNR and high SNR situations is due to some sub-optimal default parameters in the Group Basis
Pursuit program [5]. We found the phenomenon disappeared when using other software. Figure 3 (b)
gives the speed comparison of the three algorithms on a computer with dual-core 2.8 GHz CPU, 6.0 GiB
RAM, and Windows 7 OS. It shows BSBL-$\ell_1$ was the fastest due to the use of Group Basis Pursuit in
its inner loop.

7 The temporal correlation in an MMV model can be viewed as the intra-block correlation in the vectorized MMV model
(which is a block sparse model). However, it should be noted that the sensing matrix in the vectorized MMV model has the
specific structure $\boldsymbol{\Phi} \otimes \mathbf{I}_L$ [11], where $\boldsymbol{\Phi}$ is the sensing matrix in the original MMV model, $\otimes$ indicates the Kronecker product,
$\mathbf{I}_L$ is the identity matrix with the dimension $L \times L$, and $L$ is the number of measurement vectors in the MMV model. This
structure is not present in the block sparsity model considered in this work and is believed to account for the different behavior
with respect to the correlation.

D. Performance When Block Partition Is Unknown 

We set up a noisy experiment to compare all of our algorithms with StructOMP (given the number of
nonzero elements), BM-MAP-OMP (given the true noise variance), and CluSS-MCMC, under conditions
where the block partition is unknown. The matrix $\boldsymbol{\Phi}$ was of the size 192 × 512. The signal $\mathbf{x}_{\mathrm{gen}}$
contained $g_0$ nonzero blocks with random sizes and random locations (not overlapping). $g_0$ was varied
from 2 to 10. The total number of nonzero elements in $\mathbf{x}_{\mathrm{gen}}$ was fixed to 48. The intra-block correlation
value for each block uniformly randomly varied from 0.8 to 1. SNR was 15 dB. As we stated in Section
IV, knowledge of the block size $h$ is not crucial in practical use. To empirically evaluate this, we calculated
performance curves for all our algorithms using fixed values of $h = 4$ and $h = 8$. The results are shown
in Fig. 4. To improve figure readability, we only display BSBL-EM and EBSBL-BO. We also applied
T-MSBL [11] here. Note that when T-MSBL is used for the block sparse signal recovery problem (1), it can
be viewed as a special case of BSBL-EM with $h = 1$. The results show that our algorithms outperformed
StructOMP, CluSS-MCMC, and BM-MAP-OMP, and that for both BSBL-EM and EBSBL-BO, setting
$h = 4$ or $h = 8$ led to similar performance.

Fig. 4. Performance comparison when block partition was unknown.

VI. Conclusion 

Using the block sparse Bayesian learning framework and its extension, we proposed a number of 
algorithms to recover block sparse signals when the block structure is known or unknown. These 
algorithms have the ability to explore and exploit intra-block correlation in signals and thereby improve 
performance. We experimentally demonstrated that these algorithms significantly outperform existing 
algorithms. The derived algorithms also suggest that the iterative reweighted framework is a promising 
method for Group-Lasso type algorithms to exploit intra-block correlation. 